ABSTRACT

The estimation of mixing proportions $p_1, p_2, \ldots, p_m$ in the
mixture density $f(x) = \sum_{i=1}^{m} p_i f_i(x)$ is often encountered
in agricultural remote sensing problems, in which case the $p_i$'s
usually represent crop proportions. In these remote sensing
applications, component densities $f_i(x)$ have typically been assumed
to be normally distributed, and parameter estimation has been
accomplished using maximum likelihood (ML) techniques. In this paper we
examine minimum distance (MD) estimation as an alternative to ML where,
in this investigation, both procedures are based upon normal
components. Results indicate that ML techniques are superior to MD when
component distributions actually are normal, while MD estimation
provides better estimates than ML under symmetric departures from
normality. When component distributions are not symmetric, however, it
is seen that neither of these normal based techniques provides
satisfactory results.

1. Introduction

A common objective in remote sensing is the estimation
of the proportions $p_1, p_2, \ldots, p_m$ in the mixture density

$$f(x) = p_1 f_1(x) + p_2 f_2(x) + \cdots + p_m f_m(x) \tag{1.1}$$

where  m  is  the  number  of  components (crops)  in  the  mixture 
and for component i, $f_i(x)$ is a (possibly multivariate)
density.  In  past  practice  this  density  has  been  assumed  to 
be  (multivariate)  normal  with  X  being  the  reflected  energy 
in  four  bands  of  the  light  spectrum,  certain  linear 
combinations  of  these  readings,  or  other  derived  "feature" 
variables.  Generally  the  parameter  estimation  has  been 
accomplished  using  maximum  likelihood  techniques.  In  this 
paper  we  examine  the  use  of  minimum  distance  estimation  as 
an  alternative  to  maximum  likelihood  and  we  will  compare 
the  performance  of  the  two  estimation  techniques  when 
dealing  with  mixtures  of  normal  and  of  non-normal  densities 
with  varying  amounts  of  separation.  We  will  focus  on  the 
mixture  of  two  univariate  distributions  given  by 

$$f(x) = p f_1(x) + (1-p) f_2(x). \tag{1.2}$$


We  are  also  assuming  that  only  data  from  the  mixture 
distribution  are  available.  Other  sampling  schemes  in  which 
training  samples  from  the  component  distributions  are  also 
available have been discussed by Hosmer (1973),
Redner (1980), and Hall (1981), among others.

2.  Estimation  in  the  Mixture  of  Normals  Model 

In this section we will assume that $f_1(x)$ and $f_2(x)$ in
(1.2) are normal densities with mean and variance $\mu_1$, $\sigma_1^2$
and $\mu_2$, $\sigma_2^2$ respectively, where it is assumed that all
five parameters $\mu_1$, $\sigma_1^2$, $\mu_2$, $\sigma_2^2$, and $p$
are unknown. Techniques for estimating these parameters will be
discussed.

(a)  Maximum  Likelihood 

Several recent articles have dealt with the problem of
obtaining the maximum likelihood estimates of $\mu_1$, $\sigma_1^2$,
$\mu_2$, $\sigma_2^2$, and $p$ (Hasselblad (1966), Day (1969),
Wolfe (1970), Hosmer (1975), Fowlkes (1979), Lennington and
Rassbach (1979), and Redner (1980)). Since the likelihood function

$$L = f(x_1) f(x_2) \cdots f(x_n), \tag{2.1}$$

where  n  is  the  sample  size,  is  not  a  bounded  function  in 
this  case  (see  Day(1969)),  the  objective  in  the  maximum 
likelihood  approach  is  to  find  a  local  maximum  of  L.  This 
maximum is usually found by setting the partial derivatives
of log(L) with respect to each of the 5 parameters equal to
zero and solving the resulting set of equations, called the
likelihood equations. Since closed form solutions of these
equations do not exist, they must be solved using iterative
techniques. Hasselblad (1966) and Wolfe (1969) suggested that
these equations be solved by taking advantage of their
fixed point form. Redner (1980) and Redner and Walker (1982)
have pointed out that this fixed point technique is
essentially an application of the EM algorithm (see
Dempster, Laird and Rubin (1977)) with the only difference
being that using the EM algorithm, the estimates of
$\sigma_1^2$ and $\sigma_2^2$ at step k involve the updated kth step
estimates of $\mu_1$ and $\mu_2$.
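
The unboundedness of L noted by Day (1969) can be seen numerically. The
following is a minimal sketch, not part of the original study: the
function names and the data values are hypothetical. It centers
component 1 on a single observation and lets its standard deviation
shrink, so that log(L) keeps growing.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, p, mu1, sigma1, mu2, sigma2):
    """log L for the two-component normal mixture (1.2)."""
    return sum(
        math.log(p * normal_pdf(x, mu1, sigma1)
                 + (1.0 - p) * normal_pdf(x, mu2, sigma2))
        for x in data
    )

# Hypothetical data, used only for illustration.
data = [-1.2, -0.4, 0.1, 0.9, 2.3, 3.1, 4.0]

# Put mu1 exactly on the first observation and shrink sigma1.
logliks = [
    mixture_loglik(data, 0.5, data[0], s, 2.0, 2.0)
    for s in (1.0, 1e-2, 1e-4, 1e-6)
]
print(logliks)  # each value is larger than the one before
```

This is why the ML approach looks for a local maximum rather than a
global one.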

Fowlkes (1979), on the other hand, maximized the
likelihood function directly by utilizing a quasi-Newton
method for minimizing -log(L) and found that good starting
values were crucial for acceptable performance.
Hosmer (1975) stated that using the likelihood equations,
starting values were not a serious problem in his
experience. In order to determine which of the two
techniques seemed preferable in our simulation studies we
replicated simulations performed by Fowlkes in which
various sets of poor starting values were used to initiate
the minimization procedure. We simulated realizations from
the mixture utilized by Fowlkes and estimated the
parameters using both direct maximization and the EM
algorithm. The results of our simulations indicate that
the EM algorithm approach is preferable and hence we have
used this technique for obtaining MLEs in our simulations.
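
An EM iteration of the kind described above can be sketched as follows.
This is an illustrative implementation under stated assumptions, not the
code used in the study: the function names, the stopping rule (`tol`,
`max_iter`), and the simulated data are all hypothetical. The variance
updates use the freshly updated means, which is the feature of EM noted
in the text.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_two_normals(data, p, mu1, var1, mu2, var2, max_iter=500, tol=1e-8):
    """EM iterations for the five parameters of the mixture (1.2)."""
    n = len(data)
    for _ in range(max_iter):
        # E-step: posterior probability that each x came from component 1.
        w = []
        for x in data:
            a = p * normal_pdf(x, mu1, var1)
            b = (1.0 - p) * normal_pdf(x, mu2, var2)
            w.append(a / (a + b))
        # M-step: proportion, means, then variances about the NEW means.
        s1 = sum(w)
        s2 = n - s1
        new_p = s1 / n
        new_mu1 = sum(wi * x for wi, x in zip(w, data)) / s1
        new_mu2 = sum((1.0 - wi) * x for wi, x in zip(w, data)) / s2
        new_var1 = sum(wi * (x - new_mu1) ** 2 for wi, x in zip(w, data)) / s1
        new_var2 = sum((1.0 - wi) * (x - new_mu2) ** 2
                       for wi, x in zip(w, data)) / s2
        change = max(abs(new_p - p), abs(new_mu1 - mu1), abs(new_mu2 - mu2),
                     abs(new_var1 - var1), abs(new_var2 - var2))
        p, mu1, var1, mu2, var2 = new_p, new_mu1, new_var1, new_mu2, new_var2
        if change < tol:
            break
    return p, mu1, var1, mu2, var2

# Simulated, well separated mixture with true p = 0.4 (illustration only).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(120)]
        + [rng.gauss(5.0, 1.0) for _ in range(180)])
est = em_two_normals(data, 0.5, -1.0, 2.0, 6.0, 2.0)
print([round(v, 3) for v in est])
```

The sketch does not guard against the degenerate zero-variance solutions
discussed above; with poorly separated data or bad starting values such
a guard would be needed.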
(b) Minimum Distance

Although  ML  estimation  procedures  are  known  to  have 
certain  optimality  properties,  their  sensitivity  to 
violations  of  the  underlying  assumptions  is  also 
recognized.  The  development  of  estimation  procedures  which 
perform  well  even  under  moderate  deviations  from 
assumptions  has  been  a  topic  of  major  interest  in  recent 
literature. One of these robust procedures which has
received recent attention is that of minimum distance (MD)
estimation introduced by Wolfowitz (1957). Parr and
Schucany (1980), for example, have shown that MD techniques
provide robust estimators of the location parameter of a
symmetric distribution. Minimum distance estimation has
been used for parameter estimation in the mixture model by
Choi and Bulgren (1968) and MacDonald (1971) with some
success although, to our knowledge, the question of
sensitivity to assumptions in this setting has not been
addressed. These previous authors assumed that the
parameters of the component distributions were known and
that only the mixing proportion(s) was to be estimated.
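
When the component distributions are known, a distance-based estimate
of the mixing proportion has a closed form if the distance is a sum of
squares. The sketch below assumes one such criterion: the sum of
squared differences between $p F_1(Y_i) + (1-p) F_2(Y_i)$ and the
plotting positions $(i - 0.5)/n$ at the order statistics $Y_i$. The
plotting positions, the function names, and the simulated data are
assumptions made for illustration; this is not a reproduction of the
cited authors' procedures.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def md_proportion_known_components(data, cdf1, cdf2):
    """Least-squares p for H_p = p*F1 + (1-p)*F2 with F1 and F2 known."""
    y = sorted(data)
    n = len(y)
    num = 0.0
    den = 0.0
    for i, yi in enumerate(y, start=1):
        t = (i - 0.5) / n          # plotting position of the i-th order statistic
        d = cdf1(yi) - cdf2(yi)    # coefficient of p in the i-th residual
        num += d * (t - cdf2(yi))
        den += d * d
    p = num / den
    return min(1.0, max(0.0, p))   # keep the estimate inside [0, 1]

# Simulated data with true p = 0.3 (illustration only).
rng = random.Random(1)
sample = ([rng.gauss(0.0, 1.0) for _ in range(150)]
          + [rng.gauss(3.0, 1.0) for _ in range(350)])
p_hat = md_proportion_known_components(
    sample,
    lambda x: normal_cdf(x, 0.0, 1.0),
    lambda x: normal_cdf(x, 3.0, 1.0),
)
print(round(p_hat, 3))
```

Because the residuals are linear in p, no iteration is needed in this
special case; iteration becomes necessary once the component parameters
are also unknown.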

In order to briefly describe minimum distance
estimation, we let $X_1, X_2, \ldots, X_n$ denote a random sample from
a population with distribution function F and let $F_n$
denote the empirical distribution function, i.e. $F_n(x) = k/n$
where k is the number of observations less than or equal
to x. Further, let $\mathcal{H} = \{H_\theta\}$ denote a family of
distributions depending on the possibly vector valued
parameter $\theta$. The MD estimate of $\theta$ is that value of
$\theta$ for which the distance between $F_n$ and $H_\theta$ is
minimized. It is not necessary that $F \in \mathcal{H}$. Of course,
when a mixture of two normals is used as the projection family,
$H_\theta$ becomes

$$H_\theta(x) = p \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{1}{2}\left(\frac{y-\mu_1}{\sigma_1}\right)^2} dy + (1-p) \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{1}{2}\left(\frac{y-\mu_2}{\sigma_2}\right)^2} dy.$$


Certain considerations become obvious at this point.
First, we must define what we mean by the "distance"
between two distributions. Several such distance measures
have appeared in the literature. The reader is referred to
the article by Parr and Schucany (1980) for a discussion of
these measures. For our purposes we have chosen the
Cramér-von Mises distance, $W^2$, between distribution
functions $G_1$ and $G_2$ which is given by

$$W^2 = \int_{-\infty}^{\infty} [G_1(x) - G_2(x)]^2 \, dG_2(x).$$

In our setting a computing formula for the Cramér-von
Mises distance between $F_n$ and $H_\theta$ is given by

$$W^2 = \frac{1}{12n^2} + \frac{1}{n} \sum_{i=1}^{n} \left[ H_\theta(Y_i) - \frac{i - .5}{n} \right]^2,$$

where $Y_i$ is the ith order statistic. The similarity
between $W^2$ and the sum of squared differences between the
empirical distribution function $F_n$ and $H_\theta$ used by Choi and
Bulgren (1968) should be noted.
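
The computing formula can be sketched in code. The version below
evaluates $\int [F_n - H]^2 \, dH$ as $1/(12n^2)$ plus $1/n$ times the
sum of squared differences between $H(Y_i)$ and $(i - 0.5)/n$, which is
what the integral definition of $W^2$ gives when $G_1 = F_n$ and
$G_2 = H$. The function names and the sample values are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, p, mu1, sigma1, mu2, sigma2):
    """Distribution function of the two-component normal mixture."""
    return (p * normal_cdf(x, mu1, sigma1)
            + (1.0 - p) * normal_cdf(x, mu2, sigma2))

def cramer_von_mises(data, cdf):
    """W^2 between the empirical distribution of data and cdf."""
    y = sorted(data)
    n = len(y)
    s = sum((cdf(yi) - (i - 0.5) / n) ** 2 for i, yi in enumerate(y, start=1))
    return 1.0 / (12.0 * n * n) + s / n

# Hypothetical sample, for illustration only.
data = [-1.5, -0.7, -0.2, 0.3, 0.8, 2.2, 2.9, 3.4, 4.1, 4.8]
w_good = cramer_von_mises(
    data, lambda x: mixture_cdf(x, 0.5, 0.0, 1.0, 3.5, 1.0))
w_bad = cramer_von_mises(
    data, lambda x: mixture_cdf(x, 0.5, 10.0, 1.0, 12.0, 1.0))
print(w_good < w_bad)
```

Multiplying by n does not change the minimizing parameter values, so
either scaling can be handed to a sum-of-squares minimizer.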

Another consideration involves the minimization
procedure to be employed in minimizing $W^2$. Parr and
Schucany used the IMSL quasi-Newton algorithm ZXMIN. Our
comparisons have shown, however, that the IMSL routine
ZXSSQ, which uses Marquardt's (1963) method for minimizing a
sum of squares, was significantly faster, usually taking no
more than half the time required by ZXMIN. In the
simulation studies reported in the next section we have
used the Marquardt minimization procedure when calculating
the MDE. It should be noted that minimization is subject
to the constraints $\sigma_1^2 > 0$, $\sigma_2^2 > 0$, and
$0 < p < 1$. Another finding
which deserves mention before proceeding is that similar
to the technique we have chosen for calculating the MLE,
the MDE has the desirable property that it is relatively
insensitive to starting values.
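
How the constraints were enforced is not described here. One common
device, shown below purely as an assumption and not as the method used
with ZXSSQ, is to optimize over unconstrained variables and map them
back: a logistic transform for p and a log transform for the variances.
The function names are hypothetical.

```python
import math

def to_constrained(u):
    """Map unconstrained u in R^5 to (p, mu1, var1, mu2, var2)."""
    u_p, mu1, u_v1, mu2, u_v2 = u
    p = 1.0 / (1.0 + math.exp(-u_p))   # logistic keeps 0 < p < 1
    # exp keeps both variances strictly positive
    return p, mu1, math.exp(u_v1), mu2, math.exp(u_v2)

def to_unconstrained(theta):
    """Inverse map, used to convert starting values."""
    p, mu1, var1, mu2, var2 = theta
    return (math.log(p / (1.0 - p)), mu1, math.log(var1),
            mu2, math.log(var2))

theta0 = (0.25, 0.0, 1.0, 3.0, 2.0)
round_trip = to_constrained(to_unconstrained(theta0))
print(round_trip)
```

Any unconstrained minimizer can then be applied to the distance written
as a function of u, and the result converted back with
`to_constrained`.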


3.  Starting  Values 

In order for the estimators discussed in the previous
section to be used in practice, starting values for the
iterative procedures must be provided. We have chosen to
obtain starting values in this two component univariate
setting using a partitioning technique which is very easy
to implement. In the discussion to follow we will assume,
without loss of generality, that $\mu_1 < \mu_2$. This technique
involves first obtaining the initial estimate of p,
denoted by $p_0$, and then estimating the remaining four
parameters given $p_0$. Under the current implementation,
only the 9 values .1, .2, ..., .9 are allowed as possible
values for $p_0$. For each allowable value of $p_0$, the sample
is divided into two subsamples:

$$\{Y_1, Y_2, \ldots, Y_{n_1}\}$$
$$\{Y_{n_1+1}, Y_{n_1+2}, \ldots, Y_n\}$$

where $Y_i$ is the ith order statistic and $n_1$ is $n p_0$ rounded
to the nearest integer. The value for $p_0$ is that value of
p for which $p(1-p)(m_1 - m_2)^2$ is maximized, where $m_j$ is
the sample median of the jth subsample. The criterion used
here is a robust counterpart to the classical cluster
analysis procedure of selecting the clusters for which the
within cluster sum-of-squares is minimized. It is easy to
show, however, that the within cluster sum-of-squares is
minimized in the two cluster case when
$p(1-p)(\bar{x}_1 - \bar{x}_2)^2$ is
maximized, where $\bar{x}_j$ is the sample mean of cluster j and
$p = n_1/n$ with $n_1$ the number of sample values placed in
cluster 1. Such a clustering is based upon a cut-point,

c, for which all sample values below c are assigned to
the cluster associated with population 1. It must be
observed, however, that due to the overlap between the two
mixture distributions, some sample points assigned to
cluster 1 may be from population 2 and some observations
from population 1 may be in cluster 2. The effect of this
truncation of the right tail in population 1 is that the
sample mean from cluster 1 is likely to underestimate
$\mu_1$ while $\mu_2$ is likely to be overestimated. In addition
$\sigma_1^2$ and $\sigma_2^2$ are likely to be underestimated by
$s_1^2$ and $s_2^2$. If we
assume that the overlap between the two populations is not
too severe, then the sample values in cluster 1 to the
left of $m_1$ are relatively pure observations from
population 1 in which case $m_1$ is a "good" estimate of the
population mean in the case of symmetric distributions.
This reasoning also indicates that $m_1$ and $m_2$ should
provide better estimates of $\mu_1$ and $\mu_2$ than would
$\bar{x}_1$ and $\bar{x}_2$. In order to estimate the variances of
the component distributions we again will depend upon the fact
that the values to the left of $m_1$ and to the right of $m_2$ are
"pure" samples from populations 1 and 2 respectively. Thus, we
will use only this portion of the data for estimation of
the sample variances. We have used the fact that the
semi-interquartile range of a standard normal distribution
is .6745, to estimate $\sigma_1^2$ by

$$\sigma_{1(0)}^2 = \left( \frac{m_1 - r_1^{(.25)}}{.6745} \right)^2$$

where $r_j^{(q)}$ is the 100q-th percentile from the jth cluster,
j=1,2. Similarly,

$$\sigma_{2(0)}^2 = \left[ (r_2^{(.75)} - m_2)/.6745 \right]^2.$$
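
The starting value procedure can be sketched as follows. This is an
illustrative reading of the description above, not the authors' code:
the function names are hypothetical, percentiles are computed by linear
interpolation (an assumed convention, since the source does not state
one), and ties in the criterion are broken in favor of the smaller
candidate value.

```python
import statistics

def quantile(sorted_vals, q):
    """q-th sample quantile by linear interpolation (assumed convention)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def starting_values(data):
    """Return (p0, mu1, var1, mu2, var2) from the median-based partition."""
    y = sorted(data)
    n = len(y)
    best = None
    for k in range(1, 10):            # candidate p0 = .1, .2, ..., .9
        p0 = k / 10.0
        n1 = (n * k + 5) // 10        # n * p0 rounded to the nearest integer
        if n1 < 1 or n1 > n - 1:
            continue                  # each subsample needs an observation
        lower, upper = y[:n1], y[n1:]
        m1, m2 = statistics.median(lower), statistics.median(upper)
        crit = p0 * (1.0 - p0) * (m1 - m2) ** 2
        if best is None or crit > best[0]:
            best = (crit, p0, lower, upper, m1, m2)
    _, p0, lower, upper, m1, m2 = best
    # Spread from the outer halves only: lower quartile of cluster 1 and
    # upper quartile of cluster 2, scaled by the normal constant .6745.
    var1 = ((m1 - quantile(lower, 0.25)) / 0.6745) ** 2
    var2 = ((quantile(upper, 0.75) - m2) / 0.6745) ** 2
    return p0, m1, var1, m2, var2

# Synthetic, evenly spaced values chosen so the result is easy to check.
cluster1 = [-2.45 + 0.1 * i for i in range(50)]
cluster2 = [v + 10.0 for v in cluster1]
p0, mu1, var1, mu2, var2 = starting_values(cluster1 + cluster2)
print(p0, round(mu1, 3), round(var1, 3), round(mu2, 3), round(var2, 3))
```

The returned values would then be passed to the iterative ML or MD
procedure as its initial parameter vector.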

In  the  next  section  we  will  discuss  the  results  of  a 
major  simulation  investigation  comparing  ML  and  MD 
estimation. In these simulations the iterative techniques
were initiated by the starting values as discussed in the
previous paragraph. A preliminary simulation investigated
the performance of the starting values described here. In
this preliminary study we compared the convergence
initiated from these starting values with that when the
iterative  procedures  are  started  at  the  true  parameter 
values.  The  convergence  from  these  two  starts  was  almost 
always  to  the  same  parameter  estimates,  a  result  which 
held  for  both  the  MLE  and  MDE.  For  this  reason  and  results 
to  be  shown  in  Section  4,  we  believe  this  starting  value 
procedure  to  be  adequate. 

4.  Simulation  Results 

In  the  previous  two  sections  we  have  discussed  ML  and 
MD  estimators  for  the  parameters  of  the  mixture  of  two 
distributions.  In  this  section  we  report  the  results  of 
simulations  designed  to  compare  these  two  estimators  when 
the  component  distributions  are  normal  and  when  they  are 
non-normal.  In  addition  we  have  made  our  comparisons  under 
varying  degrees  of  separation  between  the  two 
distributions.  All  computations  were  performed  on  the  CDC 
6600  at  Southern  Methodist  University. 

In  our  comparison  of  the  MDE  and  MLE  we  have  begun  by 
comparing  their  performance  when  the  normality  assumption 
is  valid,  i.e.,  when  the  component  distributions  actually 
are  normal.  We  should  mention  that  because  of  the 
optimality  properties  of  the  MLE  we  would  expect  that  the 
MLE  would  be  superior  in  this  situation.  Since  in  practice 
the  validity  of  the  normality  assumption  is  subject  to 
question,  we  are  also  very  interested  in  the  performance 
of the MDE and MLE when the component distributions are
not normal. To this end we have simulated mixtures in
which the component distributions are distributed as a
Student's t with 4 degrees of freedom. We simulated 500
samples of size n=100 from mixtures of normal and of t(4)
components for each of the following parameter
configurations:

Mixing proportion
.25
.50
.75

Variances
$\sigma_1^2 = \sigma_2^2$
$\sigma_1^2 = 2\sigma_2^2$

The  nature  of  the  mixture  model  also  depends  on  the 
amount  of  separation  between  the  two  component 
distributions.  While,  for  sufficient  separation,  the 
mixture  model  has  a  characteristic  bimodal  shape, 
Behboodian (1970) has shown, for example, that a sufficient
condition for the mixture density (of two normal
components) to be unimodal is that
$|\mu_1 - \mu_2| \le 2 \min(\sigma_1, \sigma_2)$. Of
course,  in  this  situation,  parameter  estimation  is 
difficult. 
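
The sufficient condition can be checked directly, and the number of
modes can be counted on a grid as a sanity check. The sketch below is
illustrative only; the function names, the grid size, and the parameter
values are assumptions.

```python
import math

def mixture_pdf(x, p, mu1, s1, mu2, s2):
    """Density of the two-component normal mixture at x."""
    def pdf(v, mu, s):
        z = (v - mu) / s
        return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return p * pdf(x, mu1, s1) + (1.0 - p) * pdf(x, mu2, s2)

def unimodal_sufficient(mu1, s1, mu2, s2):
    """True when |mu1 - mu2| <= 2*min(s1, s2): sufficient, not necessary."""
    return abs(mu1 - mu2) <= 2.0 * min(s1, s2)

def count_modes(p, mu1, s1, mu2, s2, steps=4000):
    """Count strict local maxima of the mixture density on a grid."""
    lo = min(mu1 - 5.0 * s1, mu2 - 5.0 * s2)
    hi = max(mu1 + 5.0 * s1, mu2 + 5.0 * s2)
    f = [mixture_pdf(lo + (hi - lo) * i / steps, p, mu1, s1, mu2, s2)
         for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if f[i - 1] < f[i] > f[i + 1])

close = (unimodal_sufficient(0.0, 1.0, 1.5, 1.0),
         count_modes(0.5, 0.0, 1.0, 1.5, 1.0))
far = (unimodal_sufficient(0.0, 1.0, 4.0, 1.0),
       count_modes(0.5, 0.0, 1.0, 4.0, 1.0))
print(close, far)
```

Because the condition is only sufficient, a False result does not by
itself imply that the mixture is bimodal.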

For  purposes  of  quantifying  this  separation  between 
the  components,  we  will  define  a  measure  of  "overlap" 
between two distributions. Without loss of generality we
assume that population 1 is centered to the left of
population 2. We define "overlap" to be the probability of
misclassification using the rule:

Classify an observation x as:
population 1 if $x < x_c$
population 2 if $x > x_c$,

where $x_c$ is the unique point between $\mu_1$ and $\mu_2$ such that
$p f_1(x_c) = (1-p) f_2(x_c)$.
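
For normal components the overlap can be computed numerically. The
sketch below locates the cut-point by bisection between the two means
and then adds the two misclassification probabilities. It is
illustrative only: the function names are hypothetical, and it assumes
the scaled densities cross between the means, raising an error
otherwise.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def overlap(p, mu1, sigma1, mu2, sigma2, iters=200):
    """Return (overlap, cut_point) for normal components with mu1 < mu2."""
    def g(x):
        return (p * normal_pdf(x, mu1, sigma1)
                - (1.0 - p) * normal_pdf(x, mu2, sigma2))
    lo, hi = mu1, mu2
    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("scaled densities do not cross between the means")
    for _ in range(iters):            # bisection on g(x) = 0
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    xc = 0.5 * (lo + hi)
    # Population 1 values above xc plus population 2 values below xc.
    err = (p * (1.0 - normal_cdf(xc, mu1, sigma1))
           + (1.0 - p) * normal_cdf(xc, mu2, sigma2))
    return err, xc

# Equal variances and p = .5: a separation of twice the 90th percentile
# of the standard normal gives an overlap of .10.
z90 = 1.2815515655446004
value, cut = overlap(0.5, 0.0, 1.0, 2.0 * z90, 1.0)
print(round(value, 4), round(cut, 4))
```

In the equal-variance, p = .5 case the cut-point is the midpoint of the
means, which makes the result easy to check by hand.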

We have based our current study on "overlaps" of .03 and
.10. In Figure 1 we display the mixture densities associated
with normal components and $\sigma_1^2 = \sigma_2^2$. For each
mixture, the scaled components $p f_1(x)$ and $(1-p) f_2(x)$ are
also shown. Note that the densities for p=.75 are not displayed
here since when $\sigma_1^2 = \sigma_2^2$ it follows that
$f_p(x) = f_{1-p}(\mu_1 + \mu_2 - x)$, where $f_h(x)$
denotes the mixture density associated with a mixing
proportion of h. Thus the shapes of the densities at p=.75
can be inferred from those at p=.25. Likewise, parameter
estimation for p=.75 is not included in the results of the
simulations when $\sigma_1^2 = \sigma_2^2$.

Although both estimation procedures provide estimates of
all 5 of the parameters, only the results for the estimation
of p will be shown since the mixing proportion is the
parameter of primary interest. In addition, when dealing
with the non-normal mixtures, the remaining parameter
estimates often do not have a meaningful interpretation. In
these simulations we have used the procedure discussed in
the previous section to obtain starting values. It should be
noted that although we refer to mixtures of t(4)
distributions here, they are actually mixtures of
distributions associated with the random variable T'=aT+b,
where T has a t(4) distribution. These modifications are
made in order to obtain the desired separation and variance
ratios.
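
A sample from such a mixture of location-scale t(4) variables can be
generated as in the sketch below. It is illustrative only: the function
names and the values of a and b are hypothetical, and the t(4) variate
is built from its definition as a standard normal divided by the square
root of an independent chi-square(4) variable over 4.

```python
import math
import random

def t4_variate(rng):
    """One draw from Student's t with 4 degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(4))  # chi-square(4)
    return z / math.sqrt(v / 4.0)

def sample_t4_mixture(n, p, a1, b1, a2, b2, rng):
    """Draw n values; component j is T' = a_j * T + b_j with T ~ t(4)."""
    out = []
    for _ in range(n):
        if rng.random() < p:
            out.append(a1 * t4_variate(rng) + b1)
        else:
            out.append(a2 * t4_variate(rng) + b2)
    return out

rng = random.Random(3)
sample = sample_t4_mixture(4000, 0.5, 1.0, 0.0, 1.0, 20.0, rng)
frac_left = sum(1 for x in sample if x < 10.0) / len(sample)
print(len(sample), round(frac_left, 3))
```

Since a t(4) variable has variance 2, the component T' = aT + b has
mean b and variance $2a^2$, which is how a and b control the separation
and the variance ratio.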

In Table 1 we show the results of the simulation
comparing the performance of the MLE and MDE. In particular,
let $\hat{p}_i$ denote the estimate of p for the ith sample. Then
based upon the simulations, estimates of the bias and MSE
are given by:

$$\widehat{\mathrm{bias}} = \frac{1}{n_s} \sum_{i=1}^{n_s} (\hat{p}_i - p)$$

$$\widehat{\mathrm{MSE}} = \frac{1}{n_s} \sum_{i=1}^{n_s} (\hat{p}_i - p)^2,$$

where $n_s$ is the number of samples. It should be noted that
nMSE is the quantity actually given in the table. In
addition, we provide the ratio

$$\hat{E} = \frac{\widehat{\mathrm{MSE}}(\mathrm{MLE})}{\widehat{\mathrm{MSE}}(\mathrm{MDE})}$$

as an efficiency measure.
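
These summary measures are simple to compute. The sketch below is
illustrative: the function names and the four example estimates are
hypothetical, while the nMSE values 2.25 and 5.30 in the last line are
taken from the p = .25, overlap = .10 row of Table 1.

```python
def bias_and_mse(estimates, p_true):
    """Simulation estimates of bias and MSE for a list of estimates of p."""
    ns = len(estimates)
    bias = sum(e - p_true for e in estimates) / ns
    mse = sum((e - p_true) ** 2 for e in estimates) / ns
    return bias, mse

def efficiency(mse_mle, mse_mde):
    """E = MSE(MLE) / MSE(MDE); values below 1 favor the MLE."""
    return mse_mle / mse_mde

# Hypothetical estimates of p = .5 from four samples of size n = 100.
bias, mse = bias_and_mse([0.4, 0.5, 0.6, 0.5], 0.5)
n_mse = 100 * mse

# The common factor n cancels, so nMSE values give the same ratio.
e_row = round(efficiency(2.25, 5.30), 2)
print(bias, n_mse, e_row)
```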

Upon viewing the results, it can be seen, as expected,
that the bias and MSE associated with the MLE were generally
smaller than those for the MDE when the components were
normally distributed. This relationship between the
estimators held for both overlaps. The MLE and MDE were
quite similar at p=.5 while for p=.25 and p=.75 the
superiority of the MLE is more pronounced.

TABLE 1

Simulation Results Comparing MLE and MDE

Sample Size = 100
Number of replications = 500

NORMAL, Overlap = .10

   p        Bias             nMSE           E
         MLE     MDE      MLE     MDE
  --    .052    .125     4.26    7.80     .55
 .25    .002    .084     2.25    5.30     .42
 .50   -.009    .005     2.41    2.79     .86
 .75   -.086   -.137     4.87    8.36     .58

NORMAL, Overlap = .03

   p        Bias             nMSE           E
         MLE     MDE      MLE     MDE
  --    .008    .026      .54    1.09     .5
  --    .000    .001      .38     .42     .9

[Row labels shown as -- and the remaining entries of Table 1,
including the t(4) results, are not legible in the source. Surviving
fragments: .006, .009, -.002, .027, .008, -.024; .029, .020, -.005,
.000; 88, .44, 2.00, .47, .27, 1.74.]

For the t(4) mixtures the relationship between MDE and
MLE is reversed in that the MDE generally has the smaller
bias and MSE. The superiority of the MDE in this case is due
in part to the heavy tails in the t(4) mixture. The MLE
often interpreted an extreme observation as being the only
sample value from one of the populations with all remaining
observations belonging to the other. Due to the well known
singularities associated with a zero variance estimate for a
component distribution (Day (1969)), we were concerned that
the observed behavior of the MLE was due to the fact that we
did not constrain the variances away from zero.
However, simulation results in which equal variances were
assumed (which removes the singularity) and also those which
used a penalized MLE suggested by Redner (1980) were very
similar to those quoted here.

Although  the  MSE  is  a  widely  used  measure  among 
statisticians  for  assessing  the  performance  of  an  estimator, 
the  practical  implications,  for  example,  of  an  estimator 
having  an  MSE  three  times  larger  than  that  for  another 
estimator,  may  not  be  immediately  apparent.  Recall  that  each 
MSE quoted in Table 1 is based upon 500 estimates of p. In
order to provide a better appreciation for the practical
impact of differences in MSE, in Figure 2 we display
histograms of the 500 estimates of p associated with three
different MSEs in the table. The true value of p in each
case is p=.5. It is obvious that as the MSE increases, the
performance of the estimator deteriorates. Notice that the
MSE for Figure 2(c) is approximately three times greater
than the MSE associated with Figure 2(a), while the MSE for
Figure 2(b) is approximately twice that for Figure 2(a).
Thus, from these histograms, an intuitive feel for
efficiency ratios of E=2 and E=3 can be obtained.

A  very  surprising  result  is  that  the  starting  values 
obtained  using  the  procedure  outlined  in  Section  3  produced 
estimators  which  were  competitive  with  both  the  MLE  and  MDE. 
In  fact,  for  both  the  normal  and  t(4)  mixtures,  the  MSEs 
associated  with  the  starting  values  were  lower  than  those 
for  the  MDE  and  MLE  for  every  parameter  configuration 
associated  with  an  overlap  of  .10.  At  an  overlap  of  .03, 
however, the starting value estimates were generally poorer
than  those  for  the  MDE  and  MLE. 

5.  Mixtures  of  Asymmetric  Distributions 

The  simulation  results  of  the  previous  section  focus  on 
the  performance  of  the  MLE  and  MDE  under  deviations  from  the 
assumption  of  normality.  However,  the  t(4)  distribution  is 
symmetric,  and  recent  studies  have  indicated  that  there  is 
often  a  substantial  asymmetry  in  the  component  distributions 
for  variables  of  interest  in  agricultural  remote  sensing.  A 
Monte  Carlo  examination  of  the  performance  of  the  MDE  and 
MLE,  assuming  normal  components,  when  in  fact  the  component 
distributions were asymmetric, was performed, and the
results  of  this  examination  will  be  discussed  in  this 
section. 

For purposes of our examination, we simulated mixtures
of $\chi^2(9)$ distributions with p=.5. In these simulations the
two distributions differed from each other only by a
location shift. Actually the component distribution to the
left is $\chi^2(9)$ while that to the right is that of a "shifted"
$\chi^2(9)$ with origin no longer at 0. This shift was varied to
provide overlaps of .01, .05, and .10. Since our estimation
procedures involve a normality assumption, we used the means
and variances of the two component $\chi^2(9)$ distributions and
the true mixing proportions as our starting values. The
problem of obtaining starting values from the data in this
case is being examined. In Table 2 we display the results of
this simulation. Only when the two component distributions
were widely separated (overlap = .01) do the two procedures
provide reasonable results. However, when the two chi-square
distributions are not widely separated, both estimators tend
to seriously underestimate p. In Figure 3 we display the
three mixture distributions on which these simulations were
based. We see there that it is no surprise that the estimate
of p is less than .5, especially for an overlap of .10. Both
estimation procedures view this as a mixture of normals, and
therefore make the reasonable interpretation that the density
to the left has a smaller variance and a mixing proportion
less than .5. These results point out the impact which skewed
distributions can have on the proportion estimation in the
mixture model when normal mixtures are assumed.

Current  investigation  into  this  area  centers  around 
modifying  the  estimation  procedures  by  assuming  that  the 
underlying  component  distributions  belong  to  some  family  of 
distributions  whose  members  can  be  either  symmetric  or 
asymmetric depending on parameter configurations. At the
present time, the Weibull distribution is being examined
concerning  its  usefulness. 

6.  Concluding  Results 

We  believe  that  the  results  of  the  preceding  sections 
are  of  sufficient  substance  to  motivate  further  research  in 
the  area  of  MD  estimation  in  the  mixture  model.  Our  results 
indicate  that  the  MDE  is  indeed  more  robust  than  the  MLE  in 
the  sense  that  it  is  less  sensitive  to  symmetric  departures 
from  the  underlying  assumption  of  normality  of  component 
distributions.  Several  areas  for  future  investigation  have 
already  been  identified  in  addition  to  the  asymmetric 
components  problem  discussed  in  Section  5. 

First, simulations similar to the ones presented here
should be performed without the assumption of only two
populations  in  the  mixture.  The  performance  of  the  MDE  and 
MLE  should  be  compared  when  the  number  of  populations  is 
known  and  larger  than  two.  In  addition  the  applicability  of 
the  MDE  to  the  problem  of  estimating  the  number  of 
populations  also  warrants  investigation.  We  plan  to  examine 
these  possibilities. 


Second,  the  problem  of  applying  the  MDE  to  the  multivariate 
setting  is  of  interest.  Preliminary  indications  are  that 
such  an  extension  will  be  possible. 

Third,  the  choice  of  distance  measure  in  the  MDE  is  a 
topic of interest. Our results are not meant to imply that
$W^2$ is optimal.

Finally,  the  MDE  and  MLE  must  ultimately  be  compared  on 
real  data.  Several  related  practical  considerations  have  not 
yet  been  investigated.  For  example,  when  applying  these 
estimators  to  LANDSAT  data,  the  number  of  iterations  allowed 
must  be  small  due  to  time  constraints.  In  the  simulations 
described  here,  these  constraints  were  not  imposed  and 
iteration  was  allowed  to  continue  until  convergence  was 
obtained.  The  performance  of  the  MDE  and  MLE  under 
convergence  restrictions  should  be  examined. 


